Search for: All records

Creators/Authors contains: "Anson, Ian G"

  1. Background: Text recycling (hereafter TR), the reuse of one’s own textual materials from one document in a new document, is a common but hotly debated and unsettled practice in many academic disciplines, especially in the context of peer-reviewed journal articles. Several analytic systems have been used to detect replicated text, for example to identify plagiarism, but they do not offer an optimal way to compare documents in order to determine the nature and extent of TR, and so to study and theorize it as a practice in different disciplines. In this article, we first describe TR as a common phenomenon in academic publishing, then explore the challenges associated with trying to study the nature and extent of TR within STEM disciplines. We then describe in detail the complex processes we used to create a system for identifying TR across large corpora of texts, and the sentence-level string-distance lexical methods used to refine and test the system (White & Joy, 2004). The purpose of creating such a system is to identify legitimate cases of TR across large corpora of academic texts in different fields of study, allowing meaningful cross-disciplinary comparisons in future analyses of published work. The findings from such investigations will extend and refine our understanding of discourse practices in academic and scientific settings.
Literature Review: Text-analytic methods have been widely developed and implemented to identify reused textual materials for detecting plagiarism, and there is considerable literature on such methods. (Instead of detailing this literature, we point readers to several recent reviews: Gupta, 2016; Hiremath & Otari, 2014; and Meuschke & Gipp, 2013.) Such methods include fingerprinting, term occurrence analysis, citation analysis (identifying similarity in references and citations), and stylometry (statistically comparing authors’ writing styles; see Meuschke & Gipp, 2013).
Although TR occurs in a wide range of situations, recent debate has focused on recycling from one published research paper to another, particularly in STEM fields (see, for example, Andreescu, 2013; Bouville, 2008; Bretag & Mahmud, 2009; Roig, 2008; Scanlon, 2007). An important step toward better understanding the practice is seeing how authors actually recycle material in their published work. Standard methods for detecting plagiarism are not directly suitable for this task, because the objective is not to determine the presence or absence of reuse itself, but to study the types and patterns of reuse, including materials that are syntactically but not substantively distinct, such as “patchwriting” (Howard, 1999). In the present account of our efforts to create a text-analytic system for determining TR, we take a conventional alphabetic approach to text, in part because we did not aim at this stage of our project to analyze non-discursive text such as images or other media. However, although the project adheres to conventional definitions of text, with a focus on lexical replication, we also subscribe to context-sensitive approaches to text production. The results of applying the system to large corpora of published texts can potentially reveal variation in the practice of TR as a function of different discourse communities and disciplines. Writers’ decisions within what appear to be canonical genres are contingent, based on adherence to or deviation from existing rules and procedures if and when these actually exist. Our goal is to create a system for analyzing TR in groups of texts produced by the same authors in order to determine the nature and extent of TR, especially across disciplinary areas, without passing judgment on scholars’ use of the practice.
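The abstract mentions sentence-level string-distance lexical methods but does not specify them here. As a rough illustration only, the Python sketch below compares each sentence of a later document against every sentence of an earlier one and reports near matches. Everything in it is an assumption rather than the authors' system: the function names are hypothetical, the sentence splitter is naive, the similarity measure is the standard library's `difflib.SequenceMatcher` ratio standing in for whatever distance the article actually uses, and the 0.8 threshold is arbitrary.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(sentence.lower().split())


def recycled_sentences(source_doc, target_doc, threshold=0.8):
    """For each target sentence, find its most similar source sentence.

    Returns (target_sentence, source_sentence, similarity) tuples for pairs
    whose similarity meets the threshold. The threshold is illustrative.
    """
    source = split_sentences(source_doc)
    matches = []
    for sent in split_sentences(target_doc):
        best_score, best_src = 0.0, None
        for src in source:
            score = SequenceMatcher(None, normalize(sent), normalize(src)).ratio()
            if score > best_score:
                best_score, best_src = score, src
        if best_src is not None and best_score >= threshold:
            matches.append((sent, best_src, best_score))
    return matches


if __name__ == "__main__":
    earlier = ("We collected samples from three sites. "
               "The samples were stored at minus twenty degrees.")
    later = ("Samples were analysed with a new protocol. "
             "The samples were kept at minus twenty degrees.")
    for target, source, score in recycled_sentences(earlier, later):
        print(f"{score:.2f}  {target!r}  <-  {source!r}")
```

Because the comparison is graded rather than exact, a lightly reworded sentence (closer to patchwriting than to verbatim copying) can still be flagged, which fits the stated aim of studying types and patterns of reuse rather than only its presence. Comparing all sentence pairs is quadratic in document length, so a corpus-scale system would presumably need a cheaper candidate-selection step first.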